Diagnosing problems with imputation models using the Kolmogorov-Smirnov test: a simulation study
Abstract
BACKGROUND Multiple imputation (MI) is becoming increasingly popular as a strategy for handling missing data, but there is a scarcity of tools for checking the adequacy of imputation models. The Kolmogorov-Smirnov (KS) test has been identified as a potential diagnostic method for assessing whether the distribution of imputed data deviates substantially from that of the observed data. The aim of this study was to evaluate the performance of the KS test as an imputation diagnostic.
METHODS Using simulation, we examined whether the KS test could reliably identify departures from assumptions made in the imputation model. To do this, we examined how the p-values from the KS test behaved when skewed and heavy-tailed data were imputed using a normal imputation model. We varied the amount of missing data, the missing data models and the amount of skewness, and evaluated the performance of the KS test in diagnosing issues with the imputation models under these different scenarios.
RESULTS The KS test was able to flag differences between the observations and imputed values; however, these differences did not always correspond to problems with MI inference for the regression parameter of interest. When there was a strong missing at random dependency, the KS p-values were very small regardless of whether the MI estimates were biased, so the KS test could not discriminate between imputed variables that required further investigation and those that did not. The p-values were also sensitive to sample size and the proportion of missing data, adding to the challenge of interpreting the results from the KS test.
CONCLUSIONS Given our study results, it is difficult to establish guidelines or recommendations for using the KS test as a diagnostic tool for MI. The investigation of other imputation diagnostics and their incorporation into statistical software are important areas for future research.
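The kind of check described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' simulation design: the sample size, regression coefficients, log-normal outcome, logistic missingness model and the `mar_slope` parameter are all assumptions chosen for illustration. Missing values are filled with a single draw from a fitted normal regression, which ignores parameter uncertainty and is therefore simpler than proper MI. The KS statistic and its large-sample p-value are written out by hand so that the sketch needs only the standard library; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Large-sample p-value from the Kolmogorov distribution (an approximation)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series is numerically unstable here; the p-value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def simulate(n=2000, skewed=True, mar_slope=1.0, seed=1):
    """Impute y from a normal regression on x; compare observed vs imputed y.

    All settings are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    latent = [0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [math.exp(v) for v in latent] if skewed else latent

    # Missingness in y depends only on the fully observed x
    # (MAR when mar_slope != 0, MCAR when mar_slope == 0).
    missing = [rng.random() < 1.0 / (1.0 + math.exp(1.0 - mar_slope * xi))
               for xi in x]
    x_obs = [xi for xi, mi in zip(x, missing) if not mi]
    y_obs = [yi for yi, mi in zip(y, missing) if not mi]
    x_mis = [xi for xi, mi in zip(x, missing) if mi]

    # Least-squares fit of y on x using the observed cases.
    mx = sum(x_obs) / len(x_obs)
    my = sum(y_obs) / len(y_obs)
    sxx = sum((xi - mx) ** 2 for xi in x_obs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x_obs, y_obs))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x_obs, y_obs))
    sigma = math.sqrt(rss / (len(x_obs) - 2))

    # Single draw per missing value from the fitted normal model.
    y_imp = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x_mis]

    d = ks_statistic(y_obs, y_imp)
    return d, ks_pvalue(d, len(y_obs), len(y_imp))


if __name__ == "__main__":
    for skewed in (False, True):
        for slope in (0.0, 1.0):
            d, p = simulate(skewed=skewed, mar_slope=slope)
            print(f"skewed={skewed} mar_slope={slope} D={d:.3f} p={p:.3g}")
```

Comparing the `skewed=False` runs with and without a missing at random dependency is one way to see the interpretive difficulty reported in the results: observed and imputed values can legitimately differ in distribution when missingness depends on a covariate, even if the imputation model is correctly specified.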
Related articles
Application of the Kolmogorov-Smirnov Test to Estimate the Threshold When Estimating the Extreme Value Index
The Pareto distribution model assumption in the peaks over threshold method is tested using the Kolmogorov-Smirnov goodness-of-fit method. Pareto-distributed variables can be transformed to exponential variables, so the test is one for exponentiality. It was found that the statistic can be used as an indication of where to choose the threshold and to check the Pareto model assumption.
A Modified Kolmogorov-Smirnov Test for Normality
In this paper we propose an improvement of the Kolmogorov-Smirnov test for normality. In the current implementation of the Kolmogorov-Smirnov test, a sample is compared with a normal distribution where the sample mean and the sample variance are used as parameters of the distribution. We propose to select the mean and variance of the normal distribution that provide the closest fit to the data....
Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests
The importance of the normal distribution is undeniable, since it is an underlying assumption of many statistical procedures such as t-tests, linear regression analysis, discriminant analysis and Analysis of Variance (ANOVA). When the normality assumption is violated, interpretation and inferences may not be reliable or valid. The three common procedures in assessing whether a random sample of indep...
KOLMOGOROV-SMIRNOV TEST TO TACKLE FAIR COMPARISON OF HEURISTIC APPROACHES IN STRUCTURAL OPTIMIZATION
This paper provides a test method for making a fair comparison between different heuristics in structural optimization. When statistical methods are applied to structural optimization (namely heuristics or meta-heuristics with several tunable parameters and starting seeds), the "one problem - one result" approach is extremely far from a fair comparison. From a statistical point of view, the minimal requ...
Small Improvement to the Kolmogorov-Smirnov Test
The Kolmogorov-Smirnov (K-S) test is widely used as a goodness-of-fit test. This thesis consists of two parts that describe ways to improve the classical K-S test for both 1-dimensional and 2-dimensional data. The first part is about how to improve the accuracy of the classical K-S goodness-of-fit test for 1-dimensional data. We replace the p-values estimated by the asymptotic distribution with nea...